The table (right) shows 100 epoch results using the best lr and wd values found at 50 epochs. ViT's patchify stem differs from the proposed convolutional stem in the type of convolution used, among other factors; we investigate these factors next. The focus of this paper is studying the large, positive impact of changing ViT's default patchify stem. We use AdamW for all experiments. Figure 7 shows the results.
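As a rough illustration of this tuning protocol (a sketch, not the paper's code), the snippet below sweeps an (lr, wd) grid for 50 epochs and reruns the best pair for 100 epochs; `train_and_eval` is a hypothetical helper assumed to train with AdamW and return validation accuracy.

```python
import itertools

def select_and_retrain(train_and_eval, lrs, wds):
    # 50-epoch sweep over the (lr, wd) grid.
    results = {
        (lr, wd): train_and_eval(lr=lr, wd=wd, epochs=50)
        for lr, wd in itertools.product(lrs, wds)
    }
    # Pick the pair with the highest validation accuracy at 50 epochs,
    # then run the final, longer schedule with those values.
    best_lr, best_wd = max(results, key=results.get)
    return train_and_eval(lr=best_lr, wd=best_wd, epochs=100)
```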
Early Convolutions Help Transformers See Better
Vision transformer (ViT) models exhibit substandard optimizability: in particular, they are sensitive to the choice of optimizer (AdamW vs. SGD), optimizer hyperparameters, and training schedule length. In comparison, modern convolutional neural networks are easier to optimize. Why is this the case? In this work, we conjecture that the issue lies with the patchify stem of ViT models, which is implemented by a stride-p, p×p convolution (p = 16 by default) applied to the input image. This large-kernel plus large-stride convolution runs counter to typical design choices of convolutional layers in neural networks.
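To make the contrast concrete, here is a minimal PyTorch sketch of the patchify stem described above next to a convolutional-stem alternative built from stacked stride-2, 3×3 convolutions; the channel widths are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

d = 768  # embedding dim (ViT-B default)
p = 16   # patch size

# ViT's default patchify stem: a single stride-p, p×p convolution.
patchify_stem = nn.Conv2d(3, d, kernel_size=p, stride=p)

# A convolutional stem in the spirit of the paper: stacked stride-2, 3×3
# convolutions with BN/ReLU, then a 1×1 conv to reach width d.
def conv3x3_down(c_in, c_out):
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, kernel_size=3, stride=2, padding=1, bias=False),
        nn.BatchNorm2d(c_out),
        nn.ReLU(inplace=True),
    )

conv_stem = nn.Sequential(
    conv3x3_down(3, 64),      # widths here are illustrative
    conv3x3_down(64, 128),
    conv3x3_down(128, 256),
    conv3x3_down(256, 512),
    nn.Conv2d(512, d, kernel_size=1),
)

x = torch.randn(1, 3, 224, 224)
# Both stems reduce the input by 16x, yielding a 14×14 grid of d-dim tokens.
assert patchify_stem(x).shape == conv_stem(x).shape == (1, d, 14, 14)
```

Both stems produce the same output resolution, so the convolutional stem can replace the patchify stem without changing the rest of the transformer.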
Linearly Decomposing and Recomposing Vision Transformers for Diverse-Scale Models
Vision Transformers (ViTs) are widely used in a variety of applications, yet they usually have a fixed architecture that may not match the varying computational resources of different deployment environments. It is therefore necessary to adapt ViT architectures to devices with diverse computational budgets to achieve an accuracy-efficiency trade-off. This concept is consistent with the motivation behind Learngene. To achieve this, inspired by polynomial decomposition in calculus, where a function can be approximated by linearly combining several basic components, we propose to linearly decompose the ViT model into a set of components called learngenes during element-wise training. These learngenes can then be recomposed into differently scaled, pre-initialized models to satisfy different computational resource constraints. This decomposition-recomposition strategy provides an economical and flexible way to generate ViT models at different scales for different deployment scenarios. Compared to model compression or training from scratch, which require repeated training on large datasets for each model scale, our strategy reduces computational cost since it requires training on large datasets only once. Extensive experiments validate the effectiveness of our method: ViTs can be decomposed, and the decomposed learngenes can be recomposed into diverse-scale ViTs that achieve comparable or better performance than traditional model compression and pre-training methods. The code for our experiments is available in the supplemental material.
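The abstract does not spell out the parameterization, but the core idea can be sketched as each layer's weight being a learned linear combination of K shared basis components (the learngenes), so differently sized models reuse the same bases. Everything below (class name, widths, initialization) is a hypothetical illustration, not the authors' implementation.

```python
import torch
import torch.nn as nn

class DecomposedLinear(nn.Module):
    """Hypothetical sketch: a layer weight expressed as a linear combination
    of K shared basis components ("learngenes"): W = sum_k alpha_k * B_k."""
    def __init__(self, bases):
        super().__init__()
        self.bases = bases  # (K, d_out, d_in), shared across all models
        self.alpha = nn.Parameter(torch.randn(bases.shape[0]) * 0.1)  # per-layer

    def forward(self, x):
        # Recompose this layer's weight from the shared bases.
        weight = torch.einsum("k,koi->oi", self.alpha, self.bases)
        return x @ weight.t()

# Learngenes trained once on a large dataset, then reused.
K, d = 6, 768
learngenes = nn.Parameter(torch.randn(K, d, d) * 0.02)

# Recomposition: models of different depth share the same learngenes and
# differ only in their small per-layer coefficient vectors.
shallow = nn.Sequential(*[DecomposedLinear(learngenes) for _ in range(6)])
deep = nn.Sequential(*[DecomposedLinear(learngenes) for _ in range(12)])

tokens = torch.randn(2, d)
assert shallow(tokens).shape == deep(tokens).shape == (2, d)
```

This is why the strategy is economical: the expensive training produces the shared bases once, while each new model scale only adds cheap per-layer coefficients.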
CAPE
Do the main claims made in the abstract and introduction accurately reflect the paper's contributions and scope?
If you ran experiments...
(a) Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [Yes] We include a Python implementation of CAPE in Appendix A. All our experiments are based ...
(b) Did you specify all the training details (e.g., data splits, hyperparameters, how they were chosen)?
(c) Did you report error bars (e.g., with respect to the random seed after running experiments multiple times)?
(d) Did you include the total amount of compute and the type of resources used (e.g., type of GPUs, internal cluster, or cloud provider)?
If your work uses existing assets...
(a) Did you cite the creators?
(b) Did you mention the license of the assets? URLs allow checking the licenses of various external assets used in the paper.
(c) Did you include any new assets either in the supplemental material or as a URL? [Yes] We provide our code in the supplemental material.